Querying data from SDSS

The Sloan Digital Sky Survey (SDSS) is one the most influential surveys performed so far. Here we will learn how to get data from SDSS: nice jpg images, science images (in FITS format) and spectra (also stored as FITS).

We will recycle info from the following tuturials:

http://astropy-tutorials.readthedocs.io/en/latest/rst-tutorials/Coordinates-Intro.html
http://www.astropy.org/astroquery/


In [ ]:
# Required to see plots when running on mybinder
import matplotlib 
matplotlib.use('Agg')

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline 

# Python standard-libraries to download data from the web
from urllib.parse import urlencode
from urllib.request import urlretrieve

#Some astropy submodules that you know already
from astropy import units as u
from astropy import coordinates as coords
from astropy.coordinates import SkyCoord
from astropy.io import fits


#only here to display images
from IPython.display import Image

# These are the new modules for this notebook
from astroquery.simbad import Simbad
from astroquery.sdss import SDSS

The first thing is getting the coordinates for an object of interest, in this case NCG5406


In [ ]:
galaxy_name = 'NGC5406'
galaxy = SkyCoord.from_name(galaxy_name)

In [ ]:
pos = coords.SkyCoord(galaxy.ra, galaxy.dec, frame='icrs')
print(pos)

We can now get a picture from the SDSS DR12 image service


In [ ]:
im_size = 3*u.arcmin # get a 25 arcmin square
im_pixels = 1024
cutoutbaseurl = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'
query_string = urlencode(dict(ra=galaxy.ra.deg,
                              dec=galaxy.dec.deg,
                              width=im_pixels, height=im_pixels,
                              scale=im_size.to(u.arcsec).value/im_pixels))
url = cutoutbaseurl + '?' + query_string

# this downloads the image
image_name = galaxy_name+'_SDSS_cutout.jpg'
urlretrieve(url, image_name)

In [ ]:
Image(image_name) #load the image into the notebook

Now we need to get the identification numbers to grab the data from SDSS


In [ ]:
xid = SDSS.query_region(pos, spectro=True)

In [ ]:
print(xid)

We can finally dowload the data. The spectra in this case.


In [ ]:
spectra = SDSS.get_spectra(matches=xid)
`spectra[0]` stores all the files related to the spectra for the object of interest. This is actually an array of several HDU in the FITS format

In [ ]:
spectra[0]

The spectrum is stored as a table in the second item of the list. That means that we can get the Table doing the following


In [ ]:
spectra_data = spectra[0][1].data

If we pass spectra_data to the interpreter we can see the structure of that table. Pay attention to the field dtype (data type). It tells you the name of the different columns available in that table.

Please also check the documentation so that you can see what are the units https://data.sdss.org/datamodel/files/BOSS_SPECTRO_REDUX/RUN2D/spectra/PLATE4/spec.html


In [ ]:
spectra_data

In [ ]:
plt.plot(10**spectra_data['loglam'], spectra_data['flux'])
plt.xlabel('wavelenght (Angstrom)')
plt.ylabel('flux (nanomaggies)')
plt.title('SDSS spectra of '+galaxy_name)

The fourth record stores the positions of some emission lines


In [ ]:
spectra[0][3].data

In [ ]:
lines = spectra[0][3].data

In [ ]:
lines['LINENAME']

Let's print the wavelenght for some of them:


In [ ]:
for n in ['[O_II] 3727', '[O_III] 5007', 'H_alpha']:
    print(n, " ->", lines['LINEWAVE'][lines['LINENAME']==n])

Overplotting these lines on the spectrum


In [ ]:
plt.plot(10**spectra_data['loglam'], spectra_data['flux'], color='black')
plt.axvline(x=lines['LINEWAVE'][lines['LINENAME']=='[O_II] 3727'], label=r'O[II]', color='blue')
plt.axvline(x=lines['LINEWAVE'][lines['LINENAME']=='[O_III] 5007'], label=r'O[III]', color='red')
plt.axvline(x=lines['LINEWAVE'][lines['LINENAME']=='H_alpha'], label=r'H$\alpha$', color='green')

plt.xlabel('wavelenght (Angstrom)')
plt.ylabel('flux (nanomaggies)')
plt.title('SDSS spectra of '+galaxy_name)
plt.legend()

We can also get the images in the different SDSS bands (u,g,r,i,z)

The documentation describing the imaging data is here: https://data.sdss.org/datamodel/files/BOSS_PHOTOOBJ/frames/RERUN/RUN/CAMCOL/frame.html


In [ ]:
images = SDSS.get_images(matches=xid, band='g')

In [ ]:
image_data =  images[0][0].data

In [ ]:


In [ ]:


In [ ]:
plt.figure(figsize=(10,10))
plt.imshow(image_data)
plt.colorbar()

That wasn't very nice! Where is the galaxy? What happens is that the flux values in some of the pixels are very high compare to the typical flux.

To see something we will create a clipped image, meanign that any pixel with a flux with larger than 1 will be set to 1.


In [ ]:
clipped_image = image_data.copy()
clipped_image[clipped_image>1.0]=1.0

In [ ]:
plt.figure(figsize=(10,10))
plt.imshow(clipped_image)
plt.colorbar()

This look better now. We now see where is the galaxy.

We end now by cropping the image onto the galaxy and taking the logarithm of the flux in the original data without clipping. This should allows us to see the galaxy structture in the image:


In [ ]:
plt.figure(figsize=(10,10))
plt.imshow(np.log10(image_data[125:475,750:1100]))
plt.colorbar()

Exercise 4.1

Identify at least three of the lines seen in the spectrum of NGC5406. Using that information, what is the redshift of this galaxy?


In [ ]:

Exercise 4.2

Download the FITS images in the filters u,g,r,z,i for NGC5406. Plot in five different panels the logarithm of the flux in each of the bands (always cropping around the galaxy). Why do the images look different?


In [ ]:

Exercise 4.3

Compute the radial flux profile of NGC5406. To do that take 10 different lines starting from the center of the galaxy and plot the flux as a function of radius for those ten lines. What kind of function should fit your results?


In [ ]:

Exercise 4.4

Repeat the same producedure of this notebook (including the exercises 4.2 and 4.3) for the galaxy SDSS J013755.71+010004.9.

Why does this galaxy look so different from NGC5406?


In [ ]: